System, method and computer program product for publishing interactive web content as a statically linked web hierarchy

ABSTRACT

With a client connected to a server, an agent tool simulates user interaction and traversal of dynamic web pages, causing the server-side processes to serve HTML pages to the client. As these pages are served, they are collected by the agent, modified to include static hyperlinks to replace the server side directed navigation logic, and then persistently stored in local files where they are available for off-line navigation by a browser without the need for accessing the server(s). In effect, the HTML pages are published.

BACKGROUND OF THE INVENTION

1. Technical Field of the Invention

This invention pertains to Web technology. More particularly, it pertains to resolving and storing dynamic links as static links for publishing web content.

2. Background Art

Off-line Web browsers download web pages, and all associated graphics or the like, and save them on local media, such as a client hard drive, for viewing off-line. Many of these browsers allow a user to specify a “depth” (the maximum number of recursive links to be followed) and will follow links from the pages the user specifies. This is useful for users with limited time on-line, or who want to save a particularly good web page, graphics and all.

Available in the art are many such client web caching programs. These include, for example, GetBot, Robo Surfer, Web Buddy, WebCopy99, WebZip, and Surf Express. In addition, many web servers have a server side cache function that operates in much the same manner as these client-side web caching programs.

However, none of these caching programs provide the ability to follow data driven dynamic links, those links that are derived by executing some logic on the server, possibly in conjunction with parameters passed from user interaction with a web page, and to modify the original page to contain all of the necessary information, including static links or Javascript, to access the pages that have been followed.

It is an object of the invention to provide a system and method for publishing dynamically linked, interactive content as a statically linked web hierarchy at a client side process.

It is a further object of the invention to provide a system and method for discovering the structure of a web site and converting any and all dynamically generated content into static pages.

It is a further object of the invention to provide a system and method for modifying function components in dynamically linked, interactive web page content to provide equivalent behaviors at a client without server side transaction processing.

It is a further object of the invention to provide a system and method for publishing highly interactive web content to a distributable medium, thereby eliminating the need for a server or network connection.

It is a further object of the invention to provide a system and method for interacting with highly interactive web content when in a disconnected mode or in an area of the world where network infrastructure requires distribution on local media.

It is a further object of the invention to provide a system and method for publishing the content of HTML pages dynamically generated by a web server based on user interaction as if those pages were retrieved interactively and making the resulting content available via local media.

It is a further object of the invention to provide a system and method for accessing the content of HTML pages dynamically generated by a web server based on user interaction, without being connected to the server.

It is a further object of the invention to provide a system and method for publishing web content to a CDROM or other client based storage medium and for accessing that content through any non-connected computer browser.

It is a further object of the invention to provide a system and method for following dynamic links, those that rely on server side Java or Common Gateway Interface (CGI) program logic.

It is a further object of the invention to provide a system and method with the ability to follow data driven dynamic links and modify the original page to contain all of the necessary information to access the followed pages.

It is a further object of the invention to provide a system and method which enables a client web caching program to follow data driven dynamic links, those links that are derived by executing some logic on the server in conjunction with some parameters passed from user interaction with a web page, and modify the original page to contain all of the necessary information, including static links, Javascript and the like, to access the followed pages.

It is a further object of the invention to provide a system and method for transforming a set of Hypertext Markup Language (HTML) pages that requires server interaction into a set of HTML pages that does not require server interaction.

SUMMARY OF THE INVENTION

In accordance with the method and system of the invention, a Hypertext Markup Language (HTML) web page is parsed by an agent to identify dynamic links, those that require the server to generate a next set of HTML. These dynamic links are then replaced with computed static representations in one or more files in persistent storage where they are available to a browser.

In accordance with an aspect of the invention, there is provided a computer program product configured to be operable to replace dynamic HTML links with computed static representations.

Other features and advantages of this invention will become apparent from the following detailed description of the presently preferred embodiment of the invention, taken in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a high level system diagram of a server/client system including an agent in accordance with the preferred embodiment of the invention.

FIG. 2 is a diagrammatic representation of a web page illustrating one-to-one mapping of dynamic to static links.

FIG. 3 is a diagrammatic representation of a server generated web page illustrating selection combinations and of a corresponding agent generated web page which preserves the look and feel of the original server generated web page.

FIG. 4 is a diagrammatic representation of an agent generated web page based on the server generated web page of FIG. 3 with modified look and feel.

FIG. 5 is an illustration of a plurality of hierarchical, linked web pages.

FIG. 6 is a flow diagram of the method of the invention for preserving server generated web page look and feel.

FIG. 7 is a flow diagram of the agent executed method of the invention, generic to maintaining or altering server generated web page look and feel.

BEST MODE FOR CARRYING OUT THE INVENTION

Referring to FIG. 1, the system of the preferred embodiment of the invention is shown. Client 20 is in communication with at least one server 22, or with a plurality of servers including server 24. The method of the invention is implemented within agent 30, which resides at client 20 as shown, but may also be server based (not shown). Agent 30 generates static representations of dynamic, server based links and stores them in a plurality of data and/or logic files 32. With client 20 disconnected from servers 22, 24, browser 34 accesses files 32 and reproduces the user's traversal of the dynamic web pages by following these static representations of the dynamic links.

In accordance with the preferred embodiment of the method of the invention, with client 20 connected to server 22, agent tool 30 simulates user interaction and traversal of dynamic web pages, causing the server-side processes 22, 24 to serve HTML pages to client 20. As these pages are served, they are collected by agent 30, modified to include static hyperlinks to replace the server side directed navigation logic, and then persistently stored in files 32 where they are available for off-line navigation by browser 34 without the need for servers 22, 24. In effect, the HTML pages are published.

Referring to FIGS. 2 and 3, two (of many) types of web pages of interest to the invention are illustrated.

Referring to FIG. 2, the situation involving a one-to-one mapping of a dynamic link to a static link is illustrated. This web page asks a user to select among a plurality of possible answers, including A, B and C. As shown in this example, the response “ONLY A” is the only correct response, the others being incorrect. Consequently, response “ONLY A” is linked by link 79 to a page 78 for displaying a correct response to the user, and the others are linked by link 77 to a page 76 for an incorrect response. These links 77, 79 are dynamic links that are resolved at server 22. Agent 30 asks the server for the resolution and hard codes the corresponding links 77, 79 it receives as static links in the HTML stored in files 32. Thus, the server side derived links 77, 79 are replaced with corresponding hard coded links 77, 79 as server generated response pages 76, 78 are copied into local files 32. For example, server 22 knows that if the user checks option ONLY A, the dynamic link goes to correct response page 78 for display back to the user, and otherwise it goes to wrong answer page 76. Agent 30 sends an HTTP POST to server 22 requesting resolution of the link, and then replaces that resolution in files 32 with a link to the corresponding response page in files 32.
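A minimal Python sketch of the FIG. 2 case follows. It assumes the agent can submit each option and learn which response page the server returns; `fake_server_resolve` is a hypothetical stand-in for server 22, and `correct.html` and `incorrect.html` are assumed local names for pages 78 and 76. It illustrates the one-to-one mapping and is not the agent's actual code.

```python
import html
from typing import Callable, Dict, List

# Hypothetical stand-in for server 22: returns the response page the server
# would serve for the selected option (page 78 or page 76).
def fake_server_resolve(option: str) -> str:
    return "correct.html" if option == "ONLY A" else "incorrect.html"

def resolve_static_links(options: List[str],
                         resolve: Callable[[str], str]) -> Dict[str, str]:
    """Ask the server once per option and record where each one resolves."""
    return {opt: resolve(opt) for opt in options}

def render_static_choices(links: Dict[str, str]) -> str:
    """Emit HTML in which every choice is a hard-coded link to a local file."""
    items = [f'<li><a href="{html.escape(page)}">{html.escape(opt)}</a></li>'
             for opt, page in links.items()]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

links = resolve_static_links(["ONLY A", "ONLY B", "ONLY C"], fake_server_resolve)
print(render_static_choices(links))
```

Because each option maps to exactly one page, the generated HTML needs no client-side logic at all.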

Referring to FIG. 3, a web page is illustrated for the case where there is a decision that needs to be made. The right answer is represented by a combination of selections. Any combination of null, A, B, C and D is a possible response, and any one or more of these combinations may be a correct response. The number of possible responses grows as 2 to the power of the number N of possible selections, that is, 2^N (or 2^N − 1 if null is not a possible response), and includes only A, only B, only C, only D, A and B, A and C, A and D, B and C, B and D, C and D, and so forth. (All 2^N possible responses, including null, are shown in FIG. 4.) FIG. 3 represents both a server based web page (one generated without reference to the agent of the invention) and a client based web page resulting from agent 30 processing, which maintains the look and feel of the original, server based web page. Maintaining “look and feel” refers, in this case, to keeping the ABCD responses as shown in FIG. 3, rather than replacing the server generated display of FIG. 3 with a display presenting all of the possible combinations converted to static links, as is done in FIG. 4.
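The growth in the number of possible responses can be checked with a short Python sketch that enumerates every selection set; `all_selections` is a hypothetical helper, not part of the described system.

```python
from itertools import combinations
from typing import FrozenSet, List

def all_selections(choices: List[str], include_null: bool = True) -> List[FrozenSet[str]]:
    """Every subset of the choices, optionally including the empty (null) response."""
    start = 0 if include_null else 1
    result = []
    for k in range(start, len(choices) + 1):
        for combo in combinations(choices, k):
            result.append(frozenset(combo))
    return result

print(len(all_selections(["A", "B", "C", "D"])))                      # 16 == 2**4
print(len(all_selections(["A", "B", "C", "D"], include_null=False)))  # 15 == 2**4 - 1
```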

By way of a first example, generating a client based web page such as is illustrated in FIG. 4, without maintaining the look and feel, is done by opening a url connection to server 22; getting back as a response a set of HTML (web pages); scanning these HTML for other references to that server 22; opening up a url connection for each reference to a server; getting back more responses—more HTML; recursively doing that for each reference until there are no more remaining; for each request back, writing out into a flat file 32 corresponding web pages with static links which maintain the structure of url references in the original HTML.
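The first example writes each served page to its own flat file, so the agent needs a stable way to name a local file for a url, including dynamic urls whose query string carries user parameters. The Python function below is one possible naming scheme, offered as an assumption: host and path joined with underscores, plus a short SHA-1 digest of the query string so that each distinct parameter set gets its own static page. It expects absolute urls that include a scheme.

```python
import hashlib
from urllib.parse import urlsplit

def local_file_for(url: str) -> str:
    """Map a (possibly dynamic) absolute url to a deterministic local file name.

    Hypothetical naming scheme: host + path with '/' replaced by '_', plus an
    8-character digest of the query string when one is present.
    """
    parts = urlsplit(url)
    path = parts.path.strip("/") or "index.html"
    name = parts.netloc + "_" + path.replace("/", "_")
    if parts.query:
        digest = hashlib.sha1(parts.query.encode("utf-8")).hexdigest()[:8]
        name += "_" + digest
    if not name.endswith(".html"):
        name += ".html"
    return name

print(local_file_for("http://w3.ibm.com/hr/index.html"))  # w3.ibm.com_hr_index.html
```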

By way of a second example, generating a client based web page such as is illustrated in FIG. 3, without changing look and feel (that is, without changing the number of possible selections displayed), is done in much the same way as in the first example, except that as results are received from server 22, each potential input is simulated to derive all potential outcomes. These results may be collected in a truth table 73, represented or implemented as an array, or as a linked list, flat file, or hash table—such as in Javascript or some other client 20 object in, for example, main storage (not shown). These are linked in turn to response pages 75 in local files 32. (Java is described at http://www.javasoft.com, Javascript is described at http://devedge.netscape.com and CGI is described at http://www.cgi-perl.com.)
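A sketch of how truth table 73 might be built by simulating every potential input, under stated assumptions: `fake_submit` is a hypothetical stand-in for the server's logic (it treats B and C together as the only correct answer), and the table is a Python dict keyed by the set of selections, one of the hash-table representations mentioned above.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List

Selection = FrozenSet[str]

def build_truth_table(choices: List[str],
                      submit: Callable[[Selection], str]) -> Dict[Selection, str]:
    """Simulate every possible input (including null) and record the page returned."""
    table: Dict[Selection, str] = {}
    for k in range(len(choices) + 1):
        for combo in combinations(choices, k):
            selection = frozenset(combo)
            table[selection] = submit(selection)
    return table

# Hypothetical stand-in for server 22.
def fake_submit(selection: Selection) -> str:
    return "right.html" if selection == frozenset({"B", "C"}) else "wrong.html"

table = build_truth_table(["A", "B", "C", "D"], fake_submit)
print(table[frozenset({"B", "C"})])  # right.html
```

The agent never inspects the server's logic here; it only records where each input resolves.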

If, in example 1, a truth table is used to link answers to response pages, the links need not persist in the HTML. Rather, as is illustrated in FIG. 4, new display selection options are assigned that represent all of the possible combinations. An example of such a truth table is set forth in Table 1.

TABLE 1
TRUTH TABLE

     A  B  C  D
A    0  0  0  0
B    0  0  1  0
C    0  1  0  0
D    0  0  0  0

The example truth table above indicates that the answer C and B (selected together) is correct, and all others are incorrect. The client side Javascript examines the answer the user provides, compares that answer to the valid values provided in the truth table, and determines the path to be followed. The truth table is built by agent 30 during processing and is inserted into the resulting client side, generated web content file 32.
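Table 1 reads as a 4-by-4 matrix over the choices A to D. Assuming that a 1 at row X, column Y marks the two-choice answer {X, Y} as correct (which matches the C-and-B reading above), the client-side lookup can be sketched as follows. It is written in Python rather than Javascript for consistency with the other examples, and `next_page`, `right.html` and `wrong.html` are hypothetical names.

```python
from typing import Dict, Set

CHOICES = ["A", "B", "C", "D"]

# Table 1 as a nested dict: TRUTH[row][col] == 1 marks {row, col} as correct.
TRUTH: Dict[str, Dict[str, int]] = {
    "A": {"A": 0, "B": 0, "C": 0, "D": 0},
    "B": {"A": 0, "B": 0, "C": 1, "D": 0},
    "C": {"A": 0, "B": 1, "C": 0, "D": 0},
    "D": {"A": 0, "B": 0, "C": 0, "D": 0},
}

def next_page(answer: Set[str], right: str = "right.html", wrong: str = "wrong.html") -> str:
    """Return the local page to follow for the user's answer."""
    if len(answer) != 2 or not answer <= set(CHOICES):
        return wrong                      # only two-choice answers appear in the table
    x, y = sorted(answer)
    return right if TRUTH[x][y] == 1 else wrong

print(next_page({"C", "B"}))  # right.html
```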

Referring to FIG. 4, the look and feel of the web page is changed to list all possible responses. Alternatively, agent 30 may replace server 22 side logic with some client 20 side logic (such as an applet) that emulates what the server side does. Such an applet handles specific types of questions and server side logic, such as multiple choice, single response; multiple choice, multiple response; matching; grid; and so forth. Agent 30 provides parameters to the server, embedded in the generated web content, to have the server provide the client side logic.
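A hedged sketch of client-side logic that emulates the server's check for some of the question types named above. The answer-key formats (a single value, a collection of values, a prompt-to-answer mapping) are assumptions, as is the function name `grade`; the grid type is omitted.

```python
from typing import Any

# Hypothetical answer keys that agent 30 could embed as parameters in the
# generated page; the key format for each question type is an assumption.
def grade(question_type: str, key: Any, response: Any) -> bool:
    """Client-side emulation of the server's check for one question."""
    if question_type == "single":      # multiple choice, single response
        return response == key
    if question_type == "multiple":    # multiple choice, multiple response
        return set(response) == set(key)
    if question_type == "matching":    # each prompt mapped to one answer
        return dict(response) == dict(key)
    raise ValueError(f"unsupported question type: {question_type}")

print(grade("multiple", ["B", "C"], ["C", "B"]))  # True
```

Unlike the truth-table approach, this places the answer key itself in the client content, so it suits cases where exposing the key is acceptable.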

Referring to FIG. 5, a hierarchy of web pages is illustrated. By way of example, page 50 is served by server 22, and contains links to pages 52 and 54, also served by server 22. Page 54, corresponding for example to page 70, contains links to pages 62, 64, and 66, with, for example, page 64 corresponding to response page 76 and page 66 corresponding to response page 78. Page 62 contains a link to page 68 which, in this example, is served by server 24. Pages 52 and 54 are at a depth of 1, pages 62, 64 and 66 are at a depth of 2, and page 68 is at a depth of 3 below the original, or parent page 50. As servers 22 and 24 serve these pages to client 20 in response to requests from agent 30, each page is put in a separate file 32, and the dynamic links to servers 22, 24 replaced with static links between corresponding pages in files 32.
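The depths stated for FIG. 5 follow from a breadth-first traversal of the page hierarchy. The sketch below encodes the figure as a Python adjacency list keyed by page number; the optional `max_depth` cut-off is an assumption modelled on the “depth” setting of off-line browsers described in the background.

```python
from collections import deque
from typing import Dict, List, Optional

# FIG. 5 as an adjacency list: page number -> pages it links to.
LINKS: Dict[int, List[int]] = {
    50: [52, 54],
    52: [],
    54: [62, 64, 66],
    62: [68],      # page 68 is served by server 24
    64: [], 66: [], 68: [],
}

def depths(root: int, links: Dict[int, List[int]],
           max_depth: Optional[int] = None) -> Dict[int, int]:
    """Breadth-first traversal giving each page's link depth below the root."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        if max_depth is not None and depth[page] >= max_depth:
            continue                       # do not follow links any deeper
        for child in links.get(page, []):
            if child not in depth:         # first visit is the shortest path
                depth[child] = depth[page] + 1
                queue.append(child)
    return depth

print(depths(50, LINKS)[68])  # 3
```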

In an alternative embodiment, page 54 may correspond to page 72, and pages 64 and 66 to response pages 75. Response truth table 73 is a location in main storage temporarily used for creating links from page 72 to response pages 75.

Referring to FIG. 6, a first embodiment of the method executed by agent 30 is set forth. Assume the input from the user is the url w3.ibm.com/hr/index.html, where server 22 is represented by w3.ibm.com, and its IP address is 9.243.100.100.

In step 80, client 20 gets the IP address 9.243.100.100 of server w3.ibm.com 22.

In step 82, the next url variable is set to the entire address w3.ibm.com/hr/index.html.

In step 84, a url connection is opened to next url.

In step 86, agent 30 requests and stores the content of that url connection to memory.

In step 88, agent 30 parses through the contents of memory, retrieving and collecting all references to server 22 or IP addresses or other addresses of other servers 24.

In step 90, steps 84-88 are repeated through all references, with agent 30 requesting of servers 22, 24 and storing content served back in new local files 32, and updating next url for each iteration through steps 84-88.

In step 92, agent 30 processes the stored files 32, replacing server references (links to servers 22, 24) with local file references (links to other files within the collection of files 32).

The result is a set of local files 32 referencing each other, rather than the server(s) 22, 24.
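Steps 82 to 92 can be sketched in Python against an in-memory site, so the example runs without a network. `SITE` is a hypothetical stand-in for server 22, `fetch` replaces the url connection of steps 84 to 86, and the `pageN.html` naming is an assumption. The rewrite pass also assumes attribute values are double-quoted and contain no HTML entities.

```python
import re
from html.parser import HTMLParser
from typing import Callable, Dict, List, Set, Tuple
from urllib.parse import urljoin, urlsplit

ATTR = re.compile(r'\b(href|src|action)="([^"]*)"')

class LinkCollector(HTMLParser):
    """Collects href/src/action values from one page (step 88)."""
    def __init__(self) -> None:
        super().__init__()
        self.refs: List[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src", "action") and value:
                self.refs.append(value)

def crawl(start: str, fetch: Callable[[str], str], hosts: Set[str]) -> Dict[str, str]:
    """Steps 82-90: fetch the start url, then every reference to a known server."""
    store: Dict[str, str] = {}
    pending = [start]                          # work list of urls still to fetch
    while pending:
        url = pending.pop()
        if url in store:
            continue
        store[url] = fetch(url)                # steps 84-86
        collector = LinkCollector()
        collector.feed(store[url])             # step 88
        for ref in collector.refs:
            target = urljoin(url, ref)
            if urlsplit(target).hostname in hosts and target not in store:
                pending.append(target)         # step 90
    return store

def rewrite(store: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Step 92: name each page locally and point links at those local files."""
    names = {url: f"page{i}.html" for i, url in enumerate(sorted(store))}
    files: Dict[str, str] = {}
    for url, page in store.items():
        def to_local(m, base=url):
            target = urljoin(base, m.group(2))
            return f'{m.group(1)}="{names[target]}"' if target in names else m.group(0)
        files[names[url]] = ATTR.sub(to_local, page)
    return files, names

# Hypothetical in-memory stand-in for server 22.
SITE = {
    "http://w3.ibm.com/hr/index.html":
        '<a href="quiz.html">Quiz</a> <a href="http://example.org/x">ext</a>',
    "http://w3.ibm.com/hr/quiz.html":
        '<form action="/hr/grade?q=1"></form> <a href="index.html">Home</a>',
    "http://w3.ibm.com/hr/grade?q=1": "<p>Graded</p>",
}

store = crawl("http://w3.ibm.com/hr/index.html", SITE.__getitem__, {"w3.ibm.com"})
files, names = rewrite(store)
```

Deferring all replacements to a final pass, as in step 92, means every local file name is known before any link is rewritten; references to other hosts are left untouched.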

Referring to FIG. 7, an alternative implementation of the method executed by agent 30, which allows either changing or preserving the look and feel of the web pages served to the browser, is illustrated.

In step 100, agent 30 gets the IP address of server 22.

In step 102, next url is set equal to source file w3.ibm.com/hr/index.html.

In step 104, a url connection is opened to next url.

In step 106, agent 30 requests and stores the content of that url to memory.

In step 108, agent 30 parses through the contents of memory, identifying (tagging) all references to server 22 (that is, url w3.ibm.com/hr/index.html or ip address 9.243.100.100, and addresses of other servers 24).

In step 110, agent 30 processes url content in stored files 32 by replacing server references with local file 32 references or calls to local logic.

In step 112, steps 104-110 are repeated for all references, updating next url for each iteration.
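Step 108 must recognize references to server 22 whether they use the host name or the IP address, and step 110 must choose between a local file reference and a call to local logic. The Python sketch below shows one way to tag references; treating a query string or a form action as the sign of a dynamic link is an assumption, not a rule stated in the method, and `tag_reference` is a hypothetical name.

```python
from typing import Set
from urllib.parse import urljoin, urlsplit

# Server 22 may be referenced by host name or by IP address (step 108).
SERVER_IDS: Set[str] = {"w3.ibm.com", "9.243.100.100"}

def tag_reference(base: str, ref: str, from_form: bool = False,
                  server_ids: Set[str] = SERVER_IDS) -> str:
    """Classify one reference found in a page fetched from `base`."""
    target = urlsplit(urljoin(base, ref))
    if target.hostname not in server_ids:
        return "external"        # not a server 22 reference; left as-is
    if from_form or target.query:
        return "local-logic"     # dynamic: replace with a call to local logic
    return "local-file"          # plain page: replace with a local file reference

base = "http://w3.ibm.com/hr/index.html"
print(tag_reference(base, "grade?answer=A"))  # local-logic
```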

The create local logic step 110 requires that agent 30 keep a truth table 73, or an equivalent representation, of all the different combinations of answers A, B, C and D, together with a link to the page to which each combination resolves. This allows the look and feel to be preserved in the case where there are multiple inputs, as is illustrated in FIG. 3. Agent 30 need only know where server 22 resolves the link, not the actual logic, provided the link is deterministic. (In a non-deterministic case, the result is only a snapshot, so the method of the invention is primarily useful in deterministic cases, that is, in those cases where the servers are not rapidly changing their dynamic links.) An example of such deterministic usage is a distance learning system, where correct responses are fixed in advance and do not change as a function of time or some other such variable.
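Because the method relies on deterministic links, an agent could screen for links that do not resolve consistently. The sketch below is an assumed extension, not part of the described method: the hypothetical `find_nondeterministic` submits each input several times and flags those whose resolved page varies. Agreement across a few trials does not prove that a link is deterministic; it only catches visible variation.

```python
from typing import Callable, Hashable, Iterable, List

def find_nondeterministic(inputs: Iterable[Hashable],
                          submit: Callable[[Hashable], str],
                          trials: int = 2) -> List[Hashable]:
    """Return the inputs whose resolved page differs across repeated submissions."""
    unstable = []
    for value in inputs:
        results = {submit(value) for _ in range(trials)}
        if len(results) > 1:
            unstable.append(value)
    return unstable

# Hypothetical deterministic stand-in for server 22.
print(find_nondeterministic(["A", "B"], lambda v: f"{v}.html"))  # []
```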

ADVANTAGES OVER THE PRIOR ART

It is an advantage of the invention that there is provided a system and method for publishing dynamically linked, interactive content as a statically linked web hierarchy at a client side process.

It is a further advantage of the invention that there is provided a system and method for discovering the structure of a web site and converting any and all dynamically generated content into static pages.

It is a further advantage of the invention that there is provided a system and method for modifying function components in dynamically linked, interactive web page content to provide equivalent behaviors at a client without server side transaction processing.

It is a further advantage of the invention that there is provided a system and method for publishing highly interactive web content to a distributable medium, thereby eliminating the need for a server or network connection.

It is a further advantage of the invention that there is provided a system and method for interacting with highly interactive web content when in a disconnected mode or in an area of the world where network infrastructure requires distribution on local media.

It is a further advantage of the invention that there is provided a system and method for publishing the content of HTML pages dynamically generated by a web server based on user interaction as if those pages were retrieved interactively and making the resulting content available via local media.

It is a further advantage of the invention that there is provided a system and method for accessing the content of HTML pages dynamically generated by a web server based on user interaction, without being connected to the server.

It is a further advantage of the invention that there is provided a system and method for publishing web content to a CDROM or other client based storage medium and for accessing that content through any non-connected computer browser.

It is a further advantage of the invention that there is provided a system and method for following dynamic links, those that rely on server side Java or CGI program logic.

It is a further advantage of the invention that there is provided a system and method with the ability to follow data driven dynamic links and modify the original page to contain all of the necessary information to access the followed pages.

It is a further advantage of the invention that there is provided a system and method which enables a client web caching program to follow data driven dynamic links, those links that are derived by executing some logic on the server in conjunction with some parameters passed from user interaction with a web page, and modify the original page to contain all of the necessary information, including static links, Javascript and the like, to access the followed pages.

It is a further advantage of the invention that there is provided a system and method for transforming a set of Hypertext Markup Language (HTML) pages that requires server interaction into a set of HTML pages that does not require server interaction.

Alternative Embodiments

It will be appreciated that, although specific embodiments of the invention have been described herein for purposes of illustration, various modifications may be made without departing from the spirit and scope of the invention. In particular, it is within the scope of the invention to provide a computer program product or program element, or a program storage or memory device such as a solid or fluid transmission medium, magnetic or optical wire, tape or disc, or the like, for storing signals readable by a machine, for controlling the operation of a computer according to the method of the invention and/or to structure its components in accordance with the system of the invention.

Further, each step of the method may be executed on any general computer, such as an IBM System/390, AS/400, PC or the like and pursuant to one or more, or a part of one or more, program elements, modules or objects generated from any programming language, such as C++, Java, PL/I, Fortran or the like. And still further, each said step, or a file or object or the like implementing each said step, may be executed by special purpose hardware or a circuit module designed for that purpose.

Accordingly, the scope of protection of this invention is limited only by the following claims and their equivalents. 

1-10. (canceled)
 11. Method for accessing the content of pages dynamically generated by a web server based on user interaction without being connected to said server, comprising the steps of: getting the address of said server; setting a next locator value to correspond to the address of said server; opening a connection to said next locator value; requesting and storing to memory the content accessed by said next locator value; parsing through said memory to identify all references to locator values; repeating said opening, requesting and storing, and parsing steps for each said locator value while storing said content accessed by each said locator value to a corresponding local file; and processing content stored to said local files to replace said locator values with local file references.
 12. Method for accessing the content of pages dynamically generated by a web server based on user interaction without being connected to said server, comprising the steps of: getting the address of said server; setting a next locator value to correspond to the address of said server; opening a connection to said next locator value; requesting and storing to memory the content accessed by said next locator value; parsing through said memory to identify all references to locator values; processing content stored to memory to replace said locator values with local file references; and repeating said opening, requesting and storing, parsing, and processing steps for each said locator value.
 13. The method of claim 11, further comprising the step of publishing to a client based storage medium said local files with said locator values replaced with local file references for accessing through a browser not connected to said server.
 14. The method of claim 12, further comprising the step of publishing said content to a client based storage medium for accessing by a computer browser not connected to said server.
 15-16. (canceled)
 17. Method for transforming a server set of hyper-text markup language requiring server interaction to a client set of hyper-text markup language not requiring server interaction, comprising the steps of: executing server logic on said server set responsive to user parameters served by a client based agent to generate said server set of hyper-text markup language; storing at said client said server set of hyper-text markup language; and replacing dynamic links in said server set with local file references to generate said client set of hyper-text markup language.
 18. An agent for transforming a server set of hyper-text markup language (HTML) requiring server interaction to a client set of HTML not requiring server interaction, comprising: said agent being operable for serving to said server user parameters for executing server logic on said server set to generate said server set of hyper-text markup language; a store at said client for storing said server set of hyper-text markup language; and said agent being further operable for replacing dynamic links in said server set with local file references to generate said client set of hyper-text markup language.
 19. System for resolving and storing dynamic links as static links, comprising a client agent for requesting and storing a server generated web page; parsing said server generated web page to identify said dynamic links; and replacing said dynamic links with static links in a local file corresponding to said web page.
 20. The system of claim 19, said client agent further storing said local file to persistent storage; and further comprising a client browser for accessing said local file.
 21. The system of claim 19, said client agent being operable for iteratively requesting and storing, parsing, and replacing dynamic links with static links in each of a plurality of server generated web pages; and further comprising a local store for storing said plurality of web pages as a collection of statically linked web pages available to a client browser without further reference to said server.
 22. The system of claim 21, said local store being a persistent store.
 23. System for accessing the content of pages dynamically generated by a web server based on user interaction without being connected to said server, comprising: means for getting the address of said server; means for setting a next locator value to correspond to the address of said server; means for opening a connection to said next locator value; means for requesting and storing to memory the content accessed by said next locator value; means for parsing through said memory to identify all references to locator values; means for iteratively opening, requesting and storing, and parsing each said locator value while storing said content accessed by each said locator value to a corresponding local file; and means for processing content stored to said local files to replace said locator values with local file references.
 24. System for accessing the content of pages dynamically generated by a web server based on user interaction without being connected to said server, comprising: means for getting the address of said server; means for setting a next locator value to correspond to the address of said server; means for opening a connection to said next locator value; means for requesting and storing to memory the content accessed by said next locator value; means for parsing through said memory to identify all references to locator values; means for processing content stored to memory to replace said locator values with local file references; and means for repeating said opening, requesting and storing, parsing, and processing steps for each said locator value.
 25-26. (canceled)
 27. A computer program product configured to be operable to replace dynamic hypertext markup language (HTML) links with computed static representations in accordance with the steps of: executing server logic on a server set of HTML links responsive to user parameters served by an agent to generate said server set of hyper-text markup language; storing said server set of hyper-text markup language; and replacing dynamic links in said server set with local file references to generate said client set of hyper-text markup language.
 28. A program storage device readable by a machine, tangibly embodying a program of instructions executable by a machine to perform method steps for accessing the content of pages dynamically generated by a web server based on user interaction without being connected to said server, said method steps comprising: getting the address of said server; setting a next locator value to correspond to the address of said server; opening a connection to said next locator value; requesting and storing to memory the content accessed by said next locator value; parsing through said memory to identify all references to locator values; repeating said opening, requesting and storing, and parsing steps for each said locator value while storing said content accessed by each said locator value to a corresponding local file; and processing content stored to said local files to replace said locator values with local file references.
 29. A program storage device readable by a machine, tangibly embodying a program of instructions executable by a machine to perform method steps for accessing the content of pages dynamically generated by a web server based on user interaction without being connected to said server, said method steps comprising: getting the address of said server; setting a next locator value to correspond to the address of said server; opening a connection to said next locator value; requesting and storing to memory the content accessed by said next locator value; parsing through said memory to identify all references to locator values; processing content stored to memory to replace said locator values with local file references; and repeating said opening, requesting and storing, parsing, and processing steps for each said locator value.
 30. The program storage device of claim 29, said method steps further comprising the step of publishing to a client based storage medium said local files with said locator values replaced with local file references for accessing through a browser not connected to said server.
 31. Method for accessing at a client the content of pages dynamically generated by a web server based on user interaction without said client being connected to said server, comprising the steps of: with said client connected to said server, simulating user interaction and traversal of dynamic web pages to cause server processes to serve web pages to said client; and collecting said web pages as they are served; modifying the collected web pages to include static hyperlinks to replace server side directed navigation logic; and persistently storing said collected web pages including said static hyperlinks in local files where they are available for off-line navigation by a client browser.
 32. A computer program product for accessing the content of pages dynamically generated by a web server based on user interaction without being connected to said server, said computer program product comprising: a computer readable storage medium; first program instructions for getting the address of said server; second program instructions for setting a next locator value to correspond to the address of said server; third program instructions for opening a connection to said next locator value; fourth program instructions for requesting and storing to memory the content accessed by said next locator value; fifth program instructions for parsing through said memory to identify all references to locator values; sixth program instructions for processing content stored to memory to replace said locator values with local file references; seventh program instructions for repeating said opening, requesting and storing, parsing, and processing steps for each said locator value; and wherein said first, second, third, fourth, fifth, sixth, and seventh program instructions are recorded on said computer readable storage medium.
 33. The computer program product of claim 32, further comprising: eighth program instructions for publishing to a client based storage medium said local files with said locator values replaced with local file references for accessing through a browser not connected to said server; and wherein said eighth program instructions are recorded on said computer readable storage medium.
 34. A computer program product for accessing at a client the content of pages dynamically generated by a web server based on user interaction without said client being connected to said server, said computer program product comprising: a computer readable storage medium; first program instructions for simulating user interaction and traversal of dynamic web pages to cause server processes to serve web pages to said client with said client connected to said server and collecting said web pages as they are served; second program instructions for modifying the collected web pages to include static hyperlinks to replace server side directed navigation logic; third program instructions for persistently storing said collected web pages including said static hyperlinks in local files where they are available for off-line navigation by a client browser; fourth program instructions for navigating said static hyperlinks at said client; and wherein said first, second, third, and fourth program instructions are recorded on said computer readable storage medium. 